largeQvalue: A Program for Calculating FDR Estimates with Large Datasets
Abstract
This is an implementation of the qvalue package for the R statistical software [Dabney et al., 2014], designed for large datasets where memory or computation time is a limiting factor. In addition to estimating q values (p values adjusted for multiple testing), the software outputs a script that can be pasted into R to produce diagnostic plots and report parameter estimates. When analysing 10 million p values on a high-performance cluster, this program runs almost 30 times faster and requests substantially less memory than the qvalue package. The software has been used to control for multiple testing across 390 million tests in a full cis scan of RNA-seq exon-level gene expression from the Eurobats project [Brown et al., 2014]. The source code and links to executables for Linux and Mac OS X are available at https://github.com/abrown25/qvalue. Help for the program can be found by running ./largeQvalue --help.
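The abstract does not spell out how q values are estimated. As a rough illustration, here is a minimal Python sketch of the Storey and Tibshirani q-value procedure that the qvalue package is built around. It is not largeQvalue's code: the function name `qvalues` and the single fixed tuning parameter `lam=0.5` are assumptions for illustration (the qvalue package by default smooths the proportion-of-nulls estimate over a grid of lambda values instead).

```python
def qvalues(pvals, lam=0.5):
    """Sketch of Storey-Tibshirani q values using one fixed lambda (assumption)."""
    m = len(pvals)
    # pi0: estimated proportion of true null hypotheses, using p values above lam,
    # which are assumed to come mostly from nulls (uniform on [0, 1]).
    pi0 = sum(1 for p in pvals if p > lam) / (m * (1.0 - lam))
    pi0 = min(pi0, 1.0)

    # Indices of the p values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvals[i])

    q = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every q value at 1
    # Walk from the largest p value to the smallest, taking a cumulative minimum
    # of pi0 * m * p / rank so that q values are monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return pi0, q


if __name__ == "__main__":
    pi0, q = qvalues([0.01, 0.04, 0.03, 0.8, 0.6, 0.2])
    print(pi0, q)
```

The whole list of p values is held in memory and sorted once here; the abstract's point is that at hundreds of millions of tests, this kind of in-memory work in R becomes the bottleneck that largeQvalue is designed to avoid.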